Bi-Sparsity Pursuit: A Paradigm for Robust Subspace Recovery

Xiao  Bian,  Student  Member,  IEEE,  and  Hamid  Krim,  Fellow,  IEEE 


Abstract 

The success of sparse models in computer vision and machine learning is due to the fact that high dimensional data is
distributed in a union of low dimensional subspaces in many real-world applications. The underlying structure may, however, be
adversely affected by sparse errors. In this paper, we propose a bi-sparse model as a framework to analyze this problem, and
provide a novel algorithm to recover the union of subspaces in the presence of sparse corruptions. We further show the effectiveness
of our method in a number of applications using real-world vision data.

Index  Terms 


Signal recovery, sparse learning, subspace modeling


I. Introduction

Separating data from errors and noise has always been a critical problem in signal processing, computer vision
and data mining [4]. Robust principal component pursuit is particularly successful in recovering low dimensional structures
of high dimensional data from arbitrary sparse errors [2]. Successful applications of sparse models in computer vision and
machine learning [5] [17] have, however, increasingly hinted at a more general model, namely that the underlying structure
of high dimensional data looks more like a union of subspaces (UoS) rather than one low dimensional subspace. A natural
question is therefore whether such an approach remains feasible in high dimensional data modeling where the union of subspaces
is further impacted by sparse errors. This problem is intrinsically difficult, since the underlying subspace structure is also
corrupted by unknown errors, which may lead to unreliable measurement of distance among data samples, and make data
deviate from the original subspaces.

Recent studies on subspace clustering [13] [7] [19] show the particularly promising potential of sparse
models. In [13], a low-rank representation (LRR) recovers subspace structures from sample-specific corruptions by jointly
pursuing the lowest-rank representation of all data. The contaminated samples are sparse among all sampled data. The sum
of column-wise norms is applied to identify the sparse columns in data matrices as outliers. In [7], data sampled from UoS is
clustered using sparse representation. Input data can be recovered from noise and sparse errors under the assumption that the
underlying subspaces are still well-represented by other data points. In [19], a stronger result is achieved such that data may
be recovered even when the underlying subspaces overlap. Outliers that are sparsely distributed among data samples may be
identified as well.

In  this  paper,  we  consider  a  more  stringent  condition  that  all  data  samples  may  be  corrupted  by  sparse  errors.  Therefore  the 
UoS  structure  is  generally  damaged  and  no  data  sample  is  close  to  its  original  subspace  under  a  measure  of  Euclidean  metric. 
More  precisely,  the  main  problem  can  be  stated  as  follows: 

Problem 1. Given a set of data samples $X = [x_1, x_2, \ldots, x_n]$, find a partition of $X$, such that each part $X_I$ can be decomposed
into a low dimensional subspace (represented as a low rank matrix $L_I$) and a sparse error (represented as a sparse matrix $E_I$),
such that

$$X_I = L_I + E_I, \quad I = 1, \ldots, J.$$

Each $L_I$ then represents one low dimensional subspace of the original data space, and $L = [L_1 | L_2 | \ldots | L_J]$ the union of
subspaces. Furthermore, the partition would recover the clustering structure of original data samples hidden from the errors
$E = [E_1 | E_2 | \ldots | E_J]$.

Concretely,  the  goal  of  this  problem  is  twofold:  First,  we  wish  to  discover  the  correct  partition  of  data  so  that  data  subsets 
reside in a low dimensional subspace. Second, we wish to recover each underlying subspace from the corrupted data. It is worth
noting  that  the  corrupted  data  may  highly  affect  the  partition,  and  hence  decoupling  the  two  tasks  would  be  problematic.  In 
this  paper,  we  propose  an  integral  method  to  decompose  the  given  corrupted  data  matrix  into  two  parts,  representing  the  clean 
data  and  sparse  errors,  respectively.  The  correct  partition  of  data,  as  well  as  the  individual  subspaces,  are  also  simultaneously 
recovered.  Moreover,  we  prove  a  condition  for  the  data  to  be  exactly  recovered  as  the  global  minimum  of  the  proposed 
optimization  problem,  and  provide  an  algorithm  to  approximate  the  global  optimizer,  which  is  henceforth  referred  to  as  Robust 
Subspace  Recovery  via  Bi-Sparsity  Pursuit  (RoSuRe). 

A.  Organization  of  the  paper 

The remainder of this paper is organized as follows. In Section II, we provide the fundamental concepts necessary for the
development of our model. Building on this model, we reformulate Problem 1 in Section III as an optimization
problem, and develop the rationale along with the condition for subspace recovery. In Section IV, we introduce the RoSuRe
algorithm for robust subspace recovery. Finally, in Section V, we present experimental results on synthetic data and real-world
applications.

B.  Notation 

A brief notational summary of this paper is as follows: The dimension of an $m \times n$ matrix $X$ is denoted as $\dim(X) = (m, n)$.
$\|X\|_0$ denotes the number of nonzero elements in $X$, while $\|X\|_1$ denotes the vector $\ell_1$ norm, i.e. the sum of the absolute values of the entries of $X$. For a matrix $X$ and an index
set $J$, we let $X_J$ be the submatrix containing only the columns with indices in $J$. $\operatorname{col}(X)$ denotes the column space of matrix
$X$. We write $P_{\Omega_A} X$ as the orthogonal projection of matrix $X$ on the support of $A$, and $P_{\Omega_A^c} X = X - P_{\Omega_A} X$. The sparsity
of an $m \times n$ matrix $X$ is denoted by $\rho(X) = \|X\|_0 / (mn)$.

II. Problem Formulation

A. A union of subspaces with corrupted data

Consider a set of data points $\{l_i\}_{i=1}^{n} \subset \mathbb{R}^d$ sampled from a union of subspaces $S = \cup_i S^i$. With an assumed sufficient sample
density, each sample $l_i$ can be represented by the others from the same subspace $S(l_i)$:

$$l_i = \sum_{j \neq i,\; l_j \in S(l_i)} w_{ji}\, l_j.$$

Furthermore, if we represent the above relation in matrix form using $L = [l_1 | l_2 | \ldots | l_n]$, we then have

$$L = LW, \quad W_{ii} = 0,$$

where $W$ is an $n \times n$ matrix with zero diagonal.

More specifically, let $n_i$ be the number of samples from $S^i$, and $(b_i, b_i)$ the dimension of block $W_i$ of $W$; then $n_i \geq b_i$. It
follows that $b_i \leq \max_i\{n_i\}$. This condition constrains $W$ to be a sparse matrix, since $\rho(W) = \|W\|_0 / n^2 \leq \max\{b_i\}/n \leq
\max\{n_i\}/n$. It is worth noting that recovering the underlying data sampled from UoS is equivalent to finding matrices $L$ and
$W$ under the above constraints. The space of $W$ can then be defined as follows.
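
The sparsity bound above can be checked numerically on a small example. The following is a minimal sketch in plain Python, assuming a fully dense block-diagonal $W$ with zero diagonal (the densest case the bound has to cover); the helper names `sparsity` and `block_diagonal` and the block sizes are illustrative and do not come from the paper.

```python
def sparsity(W):
    """rho(W) = ||W||_0 / (m * n) for a matrix stored as a list of rows."""
    rows, cols = len(W), len(W[0])
    nonzeros = sum(1 for row in W for v in row if v != 0)
    return nonzeros / (rows * cols)


def block_diagonal(block_sizes, value=1.0):
    """Block-diagonal matrix with zero diagonal; all other in-block entries equal `value`."""
    n = sum(block_sizes)
    W = [[0.0] * n for _ in range(n)]
    start = 0
    for b in block_sizes:
        for i in range(start, start + b):
            for j in range(start, start + b):
                if i != j:
                    W[i][j] = value
        start += b
    return W


# Hypothetical example: three subspaces holding 3, 2 and 5 samples.
block_sizes = [3, 2, 5]
n = sum(block_sizes)
W = block_diagonal(block_sizes)
rho = sparsity(W)                 # 28 nonzeros out of 100 entries
bound = max(block_sizes) / n      # max{b_i} / n
print(rho, bound)
```

Even in this densest case the fraction of nonzeros stays below $\max\{b_i\}/n$, which is why the self-representation matrix is sparse whenever no single subspace dominates the sample set.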

Definition 1. (k-block-diagonal matrix) We say that an $n \times n$ matrix $M$ is k-block-diagonal if and only if there exists a
permutation matrix $P$, such that $\tilde{M} = P M P^{-1}$ is a block-diagonal matrix with $k$ diagonal blocks. The space of all such
matrices is denoted as $BM_k$.

We next define the space of matrices whose columns reside in UoS based on the space $BM_k$ of $W$.

Definition 2. (k-self-representative matrix) We say that a $d \times n$ matrix $X$ with no zero column is k-self-representative if and
only if

$$X = XW, \quad W \in BM_k, \quad W_{ii} = 0.$$

The space of all such $d \times n$ matrices is denoted by $SR_k$.

Consider the case where the sample $l_i$ is corrupted by some sparse error $e_i$. Intuitively, we want to separate the sparse
errors from the data matrix $X$ so that the remainder lies in $SR_k$. Therefore Problem 1 can be formulated as

$$\min_{L, E} \|E\|_0 \quad \text{s.t.} \quad X = L + E, \; L \in SR_k. \tag{1}$$


We have some fundamental difficulties in solving this problem, on account of the combinatorial nature of $\|\cdot\|_0$ and the complex
geometry of $SR_k$. For the former, there are established results on using the $\ell_1$ norm to approximate the sparsity of $E$
[3] [21]. The real difficulty, however, is that not only is $SR_k$ a non-convex space,$^1$ but, even worse, $SR_k$ is not path-connected.
Intuitively, it is helpful to consider $L_1, L_2 \in SR_k$ with $\operatorname{col}(L_1) \cap \operatorname{col}(L_2) = \{0\}$; then all possible paths connecting $L_1$ and
$L_2$ must pass through the origin, given that $L$ is a matrix with no zero columns, and $0 \notin SR_k$. $SR_k$ can hence be divided
into at least two components $S_p$ and $SR_k / S_p$.

$^1$ Consider $M_1, M_2 \in SR_1$ chosen such that $(M_1 + M_2)/2 \notin SR_1$.

To avoid solving Eqn(1) with a disconnected feasible region, we opt to integrate this constraint into the objective function,
and see the problem from a different angle. We hence propose the following definition:

Definition 3. ($\mathcal{W}_0$-function on a matrix space) For any $d \times n$ matrix $X$, if there exists $W \in BM_k$ such that $X = XW$, then

$$\mathcal{W}_0(X) = \min_{W} \|W\|_0, \quad \text{s.t.} \quad X = XW, \; W_{ii} = 0, \; W \in BM_k \text{ for some } k.$$

Otherwise, $\mathcal{W}_0(X) = \infty$.

Then instead of Eqn(1), we consider the following optimization problem:

$$\min_{L, E} \mathcal{W}_0(L) + \lambda \|E\|_0 \quad \text{s.t.} \quad X = L + E. \tag{2}$$

The relation between Eqn(1) and Eqn(2) is established by the following lemma:

Lemma 1. For a certain $\lambda$, if $(\hat{L}, \hat{E})$ is a pair of global optimizers of Eqn(2), then $(\hat{L}, \hat{E})$ is also a global optimizer of Eqn(1).

The proof of Lemma 1 is presented in Appendix A-A.

Next we will leverage the parsimonious property of the $\ell_1$ norm to approximate $\|\cdot\|_0$. First, the definition of $\mathcal{W}_0(\cdot)$ is extended
to an $\ell_1$-norm-based function:

Definition 4. ($\mathcal{W}_1$-function on a matrix space) For any $d \times n$ matrix $X$, if there exists $W \in BM_k$ such that $X = XW$, then

$$\mathcal{W}_1(X) = \min_{W} \|W\|_1, \quad \text{s.t.} \quad X = XW, \; W_{ii} = 0, \; W \in BM_k \text{ for some } k.$$

Otherwise, $\mathcal{W}_1(X) = \infty$.

As a result, we propose the following reformulation of the problem:

$$\min_{L, E} \mathcal{W}_1(L) + \lambda \|E\|_1 \quad \text{s.t.} \quad X = L + E. \tag{3}$$

It is worth noting that Eqn(3) bears a similar form to the problem of robust PCA in [2]. Intuitively, both problems
attempt to decompose the data matrix into two parts: one with a parsimonious support, and the other also with a sparse support,
however in a different domain. For robust PCA, the parsimonious support of the low rank matrix lies in the domain of singular
values. In our case, the sparse support of $L$ lies in the matrix $W$ of the $\mathcal{W}_0$ function, meaning that columns of $L$ can be
sparsely self-represented.

III. Recovery of a union of subspaces

In this section, we discuss the important question of when the underlying structure can be exactly recovered by solving
Eqn(3). This problem is essentially twofold: first, it is about the exact recovery of (L, E); and second, it is about when W
correctly reflects the true UoS structure.

A.  A  sufficient  condition  for  exact  recovery 

The exact recovery of L and E relies on the properties of both matrices. In particular, we would expect these two matrices
to be fundamentally different from each other to ensure exact recovery. For example, if E shares the same UoS structure as L,
then a segmentation of L and E would be impossible without further prior information. In other words, if all perturbations
caused by E do not affect the UoS structure of L, we then cannot distinguish E from L using only the information of their
geometric space.

Inspired by this intuition, we establish a sufficient condition for exact decomposition of L and E as follows:

Theorem 1. $(L, E)$ can be exactly recovered by solving Eqn(3) with $\lambda > 0$, i.e. $(\hat{L}, \hat{E}) = (L, E)$, if for any $Z$ of the same
dimension as $L$ with $L + Z \in SR_k$,

||PotZ||i-||Po.Z||i>&, 

where $k$ is the number of subspaces, and $W = \mathcal{W}_1(L)$.

The proof of Theorem 1 is presented in Appendix A-B. In particular, this theorem gives the “incoherence” condition between
L and E to guarantee an exact recovery. A given L defines a space of Z such that $L + Z \in SR_k$. In this case, Z also has
a low dimensional structure, since when we combine L and Z, the summation is still in $SR_k$. Furthermore, the inequality in
Theorem 1 states that all Z in that space defined by L should be fairly different from E, in the sense that nonzero elements
in Z concentrate on the complement of the support of E.

In practice, as we will see in the experimental section, the sparse errors typically reside in a space distant from the data
space, as sparse errors generally lack the coherent structures found in high dimensional data.

B.  Geometric  interpretation  of  subspace  detection  property 

After solving for L and E, the problem of finding sparse coefficients W is then equivalent to subspace clustering
without sparse errors. Specifically, W is determined by the problem defined in $\mathcal{W}_1(L)$ (Definition 4). However, it would
be fundamentally difficult to constrain W in $BM_k$ in the procedure of optimization. On the other hand, if we can get rid
of this constraint without affecting the solution of $\mathcal{W}_1(L)$, then the problem will degenerate to a classical $\ell_1$ minimization
problem with linear constraint.

We next focus on the constraint $W \in BM_k$ in $\mathcal{W}_1(L)$. Intuitively, since the sparsity of W is bounded by $\max\{b_i\}/n$,
where $b_i$ is the size of each block, we can see that the set of sparse matrices and $BM_k$ overlap. A natural question would then
be under what condition we can simply use $\ell_1$ minimization to obtain an accurate W, i.e. one reflecting the underlying subspace
structure.

In a more formal way, if $\hat{W}$ is the solution of the following problem,

$$\min_{W} \|W\|_1 \quad \text{s.t.} \quad XW = X, \; W_{ii} = 0, \tag{4}$$

and $\operatorname{supp}(\hat{W}) \subseteq \operatorname{supp}(A)$ with $A \in BM_k$, then the solution of Eqn(4) is the same as that with the constraint $W \in BM_k$, where

$$A_{ij} = \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ are in the same subspace,} \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$


In [18], Theorem 2.5 guarantees the correctness of the subspace segmentation, which they call the $\ell_1$-subspace detection property.
Intuitively, if the “subspace incoherence” for each subspace is high, and the distribution of points in each subspace is not skewed,
then $W_{ij} \neq 0$ if and only if $x_i$ and $x_j$ are in the same subspace. In this section, we provide additional insight into this problem.

Specifically, we focus on each $x_i$ in $X$, and rewrite Eqn(4) as follows for each $x_i$:

$$\min_{w} \|w\|_1 \quad \text{s.t.} \quad X_{-i} w = x_i, \tag{6}$$

where $X_{-i}$ is the matrix $X$ with the column $x_i$ removed.

We next give the $\ell_1$ subspace detection property as in [18], and then provide a sufficient condition for the subspace detection
property to hold.

Definition 5. ($\ell_1$ subspace detection property) Let dataset $X$ lie in a union of subspaces $S = S^1 \cup S^2 \cup \ldots \cup S^k$. For each
$x_i \in X$, the optimal solution of Eqn(6) is $w_i$. Then we say the pair $(X, S)$ satisfies the subspace detection property if and
only if $\operatorname{supp}(w_i) \subseteq \{j \mid x_i, x_j \in S^l\}$.

Before presenting our main result, we would like to discuss the factors that bear on this issue. On one hand, given the
dataset X in a union of subspaces, it would be easier to segment X correctly if the “distance” between any two subspaces
were sufficiently large. In the extreme case, if two subspaces overlap, then the identity of the points in the overlap region
would not be well-defined. On the other hand, the density of samples in each subspace is important, in the sense that we
need a subspace to be well-represented by the associated samples, so that we do not create “false outliers” by insufficient
sampling. For example, in a two-dimensional subspace with an x-y Cartesian coordinate system, if we somehow only have
one sample p along the y coordinate, and all the rest along the x coordinate, then without knowing the underlying structure, it
would be legitimate to assume that p is an outlier that cannot be represented by other samples, and that the rest of the data
fall on a one-dimensional subspace. We therefore would expect a sufficient condition to include both of the above conditions:
all subspaces keeping a “safe distance” from each other, and each having enough samples.

In particular, the distance between two subspaces can be measured by the first principal angle between them as

$$\theta(S_i, S_j) = \min_{u \in S_i,\, v \in S_j,\, u, v \neq 0} \arccos\left( \frac{u^T v}{\|u\|_2 \|v\|_2} \right).$$

To provide some intuition here, if $\theta(S_i, S_j) = 0$, then $S_i$ and $S_j$ overlap; and if $\theta(S_i, S_j) = \pi/2$, we have $S_i \perp S_j$. On the
other hand, to measure the sufficiency of samples, we need to first define the data density in an appropriate way. We hence
next introduce concepts related to the measure of data sufficiency.

Definition 6. (Conic Hull [1]) The conic hull of a set $C$ is

$$\operatorname{cone}(C) = \{\theta_1 x_1 + \cdots + \theta_k x_k \mid x_i \in C, \; \theta_i \geq 0, \; i = 1, \ldots, k\}.$$

It is worth noting that $\operatorname{cone}(C)$ is also the smallest convex cone that contains $C$ [1].
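
As a small illustration of the conic hull, membership can be tested directly in the plane: a point belongs to the cone generated by two linearly independent vectors exactly when its two expansion coefficients are nonnegative. The sketch below assumes the two-dimensional case only and solves the 2 x 2 system by Cramer's rule; the function names are hypothetical.

```python
def cone_coefficients_2d(a, b, x):
    """Solve x = t1 * a + t2 * b for (t1, t2) by Cramer's rule (a, b must be independent)."""
    det = a[0] * b[1] - a[1] * b[0]
    if abs(det) < 1e-12:
        raise ValueError("generators a and b are linearly dependent")
    t1 = (x[0] * b[1] - x[1] * b[0]) / det
    t2 = (a[0] * x[1] - a[1] * x[0]) / det
    return t1, t2


def in_cone_2d(a, b, x, tol=1e-12):
    """True if x lies in cone({a, b}), i.e. both coefficients are nonnegative."""
    t1, t2 = cone_coefficients_2d(a, b, x)
    return t1 >= -tol and t2 >= -tol


# The cone generated by (1, 0) and (1, 1) is the wedge between the x axis and the diagonal.
print(in_cone_2d((1.0, 0.0), (1.0, 1.0), (3.0, 1.0)))   # inside the wedge
print(in_cone_2d((1.0, 0.0), (1.0, 1.0), (-1.0, 1.0)))  # outside the wedge
```

In higher dimensions the same test becomes a nonnegative least-squares or linear feasibility problem, which this sketch does not attempt.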

We then give the $\Delta$-density condition to measure the data sufficiency as follows.

Definition 7. ($\Delta$-density condition) For all $x_j \in X^l$, if there exists an affinely independent set $\{x_{j_1}, \ldots, x_{j_m}\} \subset \pm X^l_{-j}$
such that $x_j \in C_j = \operatorname{cone}\{x_{j_1}, \ldots, x_{j_m}\}$, and the minimal circumscribed sphere in $S^l$ of $\{x_{j_1}, \ldots, x_{j_m}\}$ centered at $O_j$ obeys
$\theta(O_j, x_{j_i}) < \Delta$, $i = 1, \ldots, m$, then we say that $X^l$ in $S^l$ satisfies the $\Delta$-density condition.

Our main result is now stated as the following theorem.

Theorem 2. A dataset $X$ of unit-length points which lie in a union of subspaces $S = S^1 \cup S^2 \cup \ldots \cup S^k$ satisfies the $\ell_1$ subspace
detection property if $\forall x \in X$, $x$ satisfies the $\Delta$-density condition, and for any pair of $S^i$ and $S^j$, $\theta(S^i, S^j) > \Delta$, where
$\theta(S^i, S^j)$ is the first principal angle between $S^i$ and $S^j$.

The proof is presented in Appendix A-C. The interpretation of Theorem 2 is straightforward: the angle between subspaces
is bounded below by $\Delta$, which is exactly our measure for the data density, the maximum “size” of the smallest conic hull
containing each sample. Specifically, if we have a higher density of samples, which means we have a clearer image of each
subspace, then the segmentation of the union of subspaces can be accurately carried out under a more stringent condition, i.e.
the angle between subspaces can be smaller. On the other hand, if the samples are sparse and far from each other, it would be
more difficult to recover the underlying structure, and therefore we need the union of subspaces to be widely separated, i.e. a
larger principal angle.
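
To make the separation requirement concrete, consider the simplest case in which every subspace is a line through the origin. The first principal angle between two lines is then the acute angle between their direction vectors. The sketch below covers only this one-dimensional case (general subspaces require a singular value decomposition of the product of orthonormal bases, which is not shown); the function name and the density value used in the example are hypothetical.

```python
import math


def first_principal_angle_lines(u, v):
    """Angle in [0, pi/2] between the lines span{u} and span{v}."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # The absolute value accounts for both u and -u spanning the same line.
    cos_theta = abs(dot) / (norm_u * norm_v)
    return math.acos(min(1.0, cos_theta))


# Hypothetical check of the separation requirement for a density parameter of 0.5 rad.
density = 0.5
angle = first_principal_angle_lines((1.0, 0.0), (1.0, 1.0))  # 45 degrees
print(angle, angle > density)
```

A denser sampling corresponds to a smaller density parameter, so lines that are closer together than 45 degrees could still pass the check.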

C.  An  approximate  solution  via  sparse  modeling 

Under the conditions stated in Theorem 2, we can subsequently modify $\mathcal{W}_1(L)$ into a convex function and define it in a
connected domain by dropping the constraint $W \in BM_k$. Specifically, we have

$$\tilde{\mathcal{W}}_1(L) = \min_{W} \|W\|_1, \quad \text{s.t.} \quad L = LW, \; W_{ii} = 0. \tag{7}$$

Substituting $\mathcal{W}_1(L)$ by $\tilde{\mathcal{W}}_1(L)$ in Eqn(3) allows us to relax the constraints of Eqn(3) and directly work on the following
problem:

$$\min_{W, E} \|W\|_1 + \lambda \|E\|_1 \quad \text{s.t.} \quad X = L + E, \; L = LW, \; W_{ii} = 0. \tag{8}$$


Other than posing this problem as a recovery and clustering problem, we may also view it from a dictionary learning angle.
Note that the constraint X = L + E may be rewritten as X = LW + E, which lets us reinterpret the problem of finding L
and E as a dictionary learning problem. In addition to the sparse model, atoms in dictionary L are drawn from data samples
with sparse variation. It may hence be seen as a generalization of [6] in the sense that we not only pick representative samples
from the given data set using an $\ell_1$-norm, but also adapt the representative samples so that they can “fix” themselves and hence
be robust to sparse errors.


IV. ALGORITHM: ROBUST SUBSPACE RECOVERY VIA BI-SPARSITY PURSUIT


Obtaining an algorithmic solution to Eqn(8) is complicated by the bilinear term in the constraints, which yields a non-convex
optimization functional. In this section, we leverage the successes of the alternating direction method (ADM) [11] and linearized
ADM (LADM) [12] in large scale sparse representation problems, and focus on designing an adapted algorithm to approximate
the minimum of Eqn(8).

Our method, referred to herein as robust subspace recovery via bi-sparsity pursuit (RoSuRe), is based on linearized
ADMM [12]. Concretely, we pursue the sparsity of E and W alternately until convergence. Besides the effectiveness of
ADMM on $\ell_1$ minimization problems, a more profound rationale for this approach is that the augmented Lagrange multiplier
(ALM) method can address the non-convexity of Eqn(8) [14] [16]. Although there is no guarantee on the convergence of
general non-convex problems, Theorem 4 in [16] states that under the ALM setting, the duality gap may be zero when certain
conditions are satisfied. We show the zero duality gap property of Problem Eqn(8) in Appendix B. We can then approximate
the optimizer by solving the dual problem, with an appropriate augmented Lagrange multiplier.


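Algorithm 1 Subspace Recovery via Bi-Sparsity Pursuit (RoSuRe)

Initialize: Data matrix $X \in \mathbb{R}^{d \times n}$, $\lambda$, $\rho$, $\eta_1$, $\eta_2$

while not converged do
    Update W by linearized soft-thresholding (with $\hat{W}_k = I - W_k$)
        $L_{k+1} = X - E_k$,
        $W_{k+1} = T_{\frac{1}{\mu_k \eta_1}}\left( W_k + \frac{L_{k+1}^T (L_{k+1} \hat{W}_k - Y_k / \mu_k)}{\eta_1} \right)$,
        $W_{k+1}^{ii} = 0$.
    Update E by linearized soft-thresholding
        $E_{k+1} = T_{\frac{\lambda}{\mu_k \eta_2}}\left( E_k + \frac{(L_{k+1} \hat{W}_{k+1} - Y_k / \mu_k) \hat{W}_{k+1}^T}{\eta_2} \right)$
    Update the Lagrange multiplier Y and the augmented Lagrange multiplier $\mu$
        $Y_{k+1} = Y_k + \mu_k (L_{k+1} W_{k+1} - L_{k+1})$
        $\mu_{k+1} = \rho \mu_k$
end while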


Specifically, substituting L by X - E, and using L = LW, we can reduce Eqn(8) to a two-variable problem, and hence
write the augmented Lagrange function of Eqn(8) as follows:

$$\mathcal{L}(E, W, Y, \mu) = \lambda \|E\|_1 + \|W\|_1 + \langle LW - L, Y \rangle + \frac{\mu}{2} \|(X - E)W - (X - E)\|_F^2, \tag{9}$$

where $Y$ is the Lagrange multiplier. Letting $\hat{W} = I - W$, we alternately update $W$ and $E$:

$$W_{k+1} = \arg\min_{W} \|W\|_1 + \langle L_{k+1} W - L_{k+1}, Y_k \rangle + \frac{\mu_k}{2} \|L_{k+1} W - L_{k+1}\|_F^2, \tag{10}$$

$$E_{k+1} = \arg\min_{E} \lambda \|E\|_1 + \langle (E - X) \hat{W}_{k+1}, Y_k \rangle + \frac{\mu_k}{2} \|(E - X) \hat{W}_{k+1}\|_F^2. \tag{11}$$


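The solution of Eqn(10) and Eqn(11) can be well approximated in each iteration by linearizing the augmented Lagrange term [12]:

$$W_{k+1} = T_{\frac{1}{\mu_k \eta_1}}\left( W_k + \frac{L_{k+1}^T (L_{k+1} \hat{W}_k - Y_k / \mu_k)}{\eta_1} \right), \tag{12}$$

$$E_{k+1} = T_{\frac{\lambda}{\mu_k \eta_2}}\left( E_k + \frac{(L_{k+1} \hat{W}_{k+1} - Y_k / \mu_k) \hat{W}_{k+1}^T}{\eta_2} \right), \tag{13}$$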


where $\eta_1 \geq \|L\|_2^2$, $\eta_2 \geq \|\hat{W}\|_2^2$, and $T_{\tau}(\cdot)$ is a soft-thresholding operator.
In addition, the Lagrange multipliers are updated as follows:

$$Y_{k+1} = Y_k + \mu_k (L_{k+1} W_{k+1} - L_{k+1}), \tag{14}$$

$$\mu_{k+1} = \rho \mu_k. \tag{15}$$
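
One iteration of the scheme can be sketched in plain Python as follows. This is an illustration rather than the authors' implementation: it assumes that the linearized updates take the usual proximal-gradient form obtained from Eqn(10) and Eqn(11) with $\hat{W} = I - W$, it stores matrices as lists of rows, and all helper names (`soft_threshold`, `matmul`, `combine`, `rosure_step`) as well as the parameter values in the example are hypothetical.

```python
def soft_threshold(M, tau):
    """Entrywise soft-thresholding: sign(x) * max(|x| - tau, 0)."""
    return [[(abs(v) - tau) * (1.0 if v > 0 else -1.0) if abs(v) > tau else 0.0
             for v in row] for row in M]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def combine(A, B, alpha, beta):
    """Entrywise alpha * A + beta * B."""
    return [[alpha * a + beta * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def rosure_step(X, E, W, Y, mu, lam, eta1, eta2, rho):
    """One pass over the W, E, Y and mu updates (assumed linearized form)."""
    n = len(W)
    L = combine(X, E, 1.0, -1.0)                        # L_{k+1} = X - E_k
    W_hat = combine(identity(n), W, 1.0, -1.0)          # W_hat_k = I - W_k
    R = combine(matmul(L, W_hat), Y, 1.0, -1.0 / mu)    # L W_hat_k - Y_k / mu
    W_new = soft_threshold(
        combine(W, matmul(transpose(L), R), 1.0, 1.0 / eta1), 1.0 / (mu * eta1))
    for i in range(n):
        W_new[i][i] = 0.0                               # enforce W_ii = 0
    W_hat = combine(identity(n), W_new, 1.0, -1.0)      # W_hat_{k+1}
    R = combine(matmul(L, W_hat), Y, 1.0, -1.0 / mu)
    E_new = soft_threshold(
        combine(E, matmul(R, transpose(W_hat)), 1.0, 1.0 / eta2), lam / (mu * eta2))
    # Y + mu * (L W_new - L) equals Y - mu * L W_hat_{k+1}.
    Y_new = combine(Y, matmul(L, W_hat), 1.0, -mu)
    return E_new, W_new, Y_new, rho * mu


# Tiny example with one-dimensional data and two samples.
X = [[1.0, 1.0]]
E0 = [[0.0, 0.0]]
W0 = [[0.0, 0.0], [0.0, 0.0]]
Y0 = [[0.0, 0.0]]
E1, W1, Y1, mu1 = rosure_step(X, E0, W0, Y0, mu=2.0, lam=1.0, eta1=4.0, eta2=4.0, rho=1.5)
print(W1, E1, Y1, mu1)
```

In practice the step would be repeated until the constraint residual $L W - L$ is small, with $\eta_1$ and $\eta_2$ chosen to respect the bounds stated above; a matrix library would replace the list-based helpers for data of realistic size.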


V.  Experiments  and  Validation 


A.  Experiments  on  Synthetic  Data 


Section III discusses the conditions needed to recover a data structure by solving Eqn(1). In this section, we hence
empirically investigate the operating range of RoSuRe under various conditions. The recovery results are compared with Robust
PCA [2] using the method presented in [11] and sparse subspace clustering using the algorithm in [8].


Fig. 1. An example of robust subspace exact recovery.


The data matrix L is fixed to be a 200 x 200 matrix, and all data points are uniformly sampled from a union of 5 subspaces.
The norm of each sample is normalized to 1. In the sparse matrix $E_0$, 10% of the elements of each column are randomly selected to be
nonzero. The value of each nonzero element in $E_0$ then follows a Gaussian distribution with mean 0.5 and variance 0.5. Fig. 1
shows one example of the exact recovery and clustering. Note that $(L_{RoSuRe}, E_{RoSuRe})$ and $(L_0, E_0)$ are almost identical, and
$W_{RoSuRe}$ shows clear clustering properties such that $W_{ij} \approx 0$ when $l_i, l_j$ are not in the same subspace. In Fig. 2 we compare
the RoSuRe performance to that of Robust PCA, and demonstrate the significant improvement using our proposed method.


Fig. 2. Comparison with Robust PCA.


Fig. 3 shows the overall recovery results of RoSuRe, robust PCA and SSC. A white shaded area means a lower error and hence
amounts to exact recovery. The dimension of each subspace is varied from 1 to 15, and the sparsity of E from 0.5% to 15%.
Each submatrix is $L_I = X_I Y_I^T$, where the n x d matrices $X_I$ and $Y_I$ are independently sampled from an i.i.d. normal distribution. The
recovery error is measured as $\operatorname{err}(L) = \|L_0 - \hat{L}\|_F / \|L_0\|_F$. We can see a significantly larger operational range of RoSuRe in
comparison to those of robust PCA and SSC. RoSuRe's better performance than robust PCA is due to the underlying
data model assumption. Concretely, when the sum of the dimensions of the subspaces is small, the UoS model degenerates to a
“low-rank + sparse” model, which suits robust PCA well. On the other hand, when the dimension of each subspace increases,
the overall rank of L tends to be accordingly larger and hence the low rank model may not hold anymore. Since RoSuRe is
designed to fit the UoS model, it can recover the data structure over a wider rank range. The SSC method specifically satisfies
the modeling condition when only a small portion of data are outliers. The case where most of the data is corrupted makes it
very difficult to reconstruct samples by other corrupted ones.
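
The relative recovery error used above is simple to compute. A minimal sketch in plain Python follows, with matrices stored as lists of rows; the function names and the toy matrices are illustrative only.

```python
import math


def frobenius_norm(M):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in M for v in row))


def recovery_error(L0, L_hat):
    """Relative error ||L0 - L_hat||_F / ||L0||_F."""
    diff = [[a - b for a, b in zip(r0, r1)] for r0, r1 in zip(L0, L_hat)]
    return frobenius_norm(diff) / frobenius_norm(L0)


# Toy example: the estimate misses one entry of the ground truth.
L0 = [[3.0, 0.0], [0.0, 4.0]]
L_hat = [[3.0, 0.0], [0.0, 0.0]]
err = recovery_error(L0, L_hat)
print(err)
```

Under the grayscale mapping described in the caption of Fig. 3, errors at or above 0.2 would be rendered black and an error of 0 white.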


Fig. 3. Overall recovery results of RoSuRe and Robust PCA. Panels: (a) RoSuRe, (b) Robust PCA, (c) SSC. (0, 0.2) is mapped to (1, 0) of the grayscale image.


Fig. 4. Background subtraction on traffic videos (static camera). Panels: (a), (d) background; (b), (e) foreground; (c), (f) original frame.

B.  Experiments  on  Computer  Vision  Problems 

Since the UoS model has been intensively researched and successfully applied to many computer vision and machine learning
problems [13] [8] [4], we expect that our model can accordingly address these problems. We next present experimental results
of our method in video background subtraction and face clustering problems, as exemplars of its promising potential.

1) Video background subtraction: Surveillance videos can be naturally modeled with the UoS model due to their relatively static
background and sparse foreground. The power of our proposed UoS model lies in coping with both a static camera and a
panning one with periodic motion. Here we test our method in both scenarios using surveillance videos from the MIT traffic
dataset [20]. In Fig. 4, we show the segmentation results with a static background. For the scenario of a “panning camera”,
we generate a sequence by cropping the previous video. The cropped region is swept from bottom right to top left and then
backward periodically, at the speed of 5 pixels per frame. The results are shown in Fig. 5. We can see that the results in the
moving camera scenario are only slightly worse than the static case.


Fig. 5. Background subtraction on traffic videos (panning camera). Panels: (a), (d) background; (b), (e) foreground; (c), (f) original frame.

More interestingly, the sparse coefficient matrix W provides important information about the relations among data points,
which potentially may be used to group data into individual clusters. In Fig. 6(a), we can see that, for each column of the
coefficient matrix W, the nonzero entries appear periodically. Given the periodic motion of the camera, this essentially
means that every frame is mainly represented by the frames taken when the camera is in a similar position, i.e. with a similar background,
with the foreground moving objects as sparse perturbations. We hence permute the rows and columns of W according to the
position of the camera, as shown in Fig. 6(b). A block-diagonal structure then emerges, where images with similar backgrounds
are clustered as one subspace.


Fig. 6. Coefficient matrix W (a) without rearrangement according to the position of the camera, (b) with rearrangement according to the position of the
camera.


Table I. Clustering error (%) on the Extended Yale Face Database B compared to state-of-the-art methods [8] [13] [22]

Algorithm               LSA      LRR      SSC      RoSuRe
2 subjects    Mean      38.20    2.54     1.86     0.71
              Median    47.66    0.78     0.00     0.39
5 subjects    Mean      58.02    6.90     4.31     3.24
              Median    56.87    5.63     2.50     1.72
10 subjects   Mean      60.42    22.92    10.94    5.62
              Median    57.50    23.59    5.63     5.47


2) Face clustering under various illumination conditions: Recent research on sparse models has unveiled that a parsimonious
representation may be a key factor for classification [4] [9]. Indeed, the sparse coefficients pursued by our method show
clustering features in experiments on both synthetic and real-world data. To further explore the applicability of our method, we
evaluate the clustering performance on the Extended Yale face database B [10], and compare our results to the state-of-the-art
methods [22] [13] [8].


Fig.  7.  Sample  face  images  in  Extended  Yale  face  database  B 


The database includes cropped face images of 38 different people under various illumination conditions. Images of each
person may be seen as data points from one subspace, albeit heavily corrupted by entries due to different illumination conditions,
as shown in Fig. 7. In our experiment, we adopt the same setting as [8], such that each image is downsampled to 48 x 42 and
is vectorized to a 2016-dimensional vector. In addition, we use the sparse coefficient matrix W from RoSuRe to formulate an
affinity matrix as $A = \tilde{W} + \tilde{W}^T$, where $\tilde{W}$ is a thresholded version of W. The spectral clustering
method in [15] is utilized to determine the clusters of data, with the affinity matrix A as the input.


Fig. 8. Clustering accuracy vs. the value of $\lambda$
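
The affinity construction described above (threshold W, then symmetrize) can be sketched as follows. Several choices here are assumptions made only for illustration: the thresholding rule (a fixed cutoff `tau` on absolute values), the use of absolute values so that the affinity is nonnegative, and a simple two-way split by the sign of the Fiedler vector of the normalized Laplacian, which stands in for the spectral clustering method of [15]. The toy matrix is hypothetical.

```python
import numpy as np

def affinity_from_coefficients(W, tau):
    """Zero out entries with |w| < tau, then symmetrize: A = |W_t| + |W_t|^T."""
    W_t = np.where(np.abs(W) >= tau, np.abs(W), 0.0)
    np.fill_diagonal(W_t, 0.0)  # a sample does not contribute affinity to itself
    return W_t + W_t.T

def two_way_spectral_split(A):
    """Split the samples into two groups by the sign of the Fiedler vector."""
    degree = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    laplacian = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues are returned in ascending order
    return (eigvecs[:, 1] > 0).astype(int)

# Toy coefficient matrix (hypothetical): two groups of three samples each.
W = np.full((6, 6), 0.01)   # weak spurious coefficients everywhere
W[:3, :3] = 1.0             # strong coefficients inside the first group
W[3:, 3:] = 1.0             # strong coefficients inside the second group
W[2, 3] = 0.1               # one moderate cross-group coefficient
np.fill_diagonal(W, 0.0)

A = affinity_from_coefficients(W, tau=0.05)
labels = two_way_spectral_split(A)
print(labels)
```

For more than two clusters, a full spectral clustering routine (embedding with several eigenvectors followed by k-means) would replace the sign-based split used in this sketch.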


We compare the clustering performance of RoSuRe with state-of-the-art methods such as local subspace analysis (LSA) [22],
sparse subspace clustering (SSC) [8], and low rank representation (LRR) [13]. The best performance of each
method is referenced in Table I for comparison. As shown in the table, RoSuRe has the lowest mean clustering error rate in
all three settings, i.e. 2 subjects, 5 subjects and 10 subjects. In particular, in the most challenging case of 10 subjects, the
mean clustering error rate is as low as 5.62% with a median of 5.47%. Additionally, we show the robustness of our method
with respect to $\lambda$ in a 10-subject scenario. In Fig. 8, the clustering accuracy remains above 98% as $\lambda$
varies from 500 to 15000.

In Fig. 9, we present the recovery results of some sample faces from the 10-subject clustering scenario. In most cases, the
sparse term E compensates for the information missing due to lighting conditions. This is especially evident when the shadow
area is small, i.e. when the error term E has a sparser support: we then see a visually perfect recovery of the missing area.
This result validates the effectiveness of our method in solving the problem of subspace clustering with sparsely corrupted data.


Fig. 9. Recovery results of human face images. The three rows from top to bottom are original images, the components E, and
the recovered images, respectively.


VI.  Conclusion 

We have proposed in this paper a novel approach to recover the underlying subspaces of data samples from measured data
corrupted by general sparse errors. We formulated the problem as a non-convex optimization problem, and proved a necessary
condition for exact recovery. We also designed an effective algorithm named RoSuRe to closely approximate the global solution
of the optimization problem. Furthermore, experiments on both synthetic data and real-world vision data are presented to
demonstrate a broad range of applications of our method.

Future work may include several aspects across computer vision and machine learning. It would first be interesting to
understand and extend this work from a dictionary learning angle, to learn a feature set for high dimensional data representation
and recognition. Additionally, a necessary condition for exact recovery has been proved in this paper. Exploring a sufficient
condition is not only theoretically interesting, but also helpful for better understanding the problem.


APPENDIX  A 
Proofs 


A. Proof of Lemma 1

First, we rewrite the objective function in Eqn(2) as

$$f(L, E) = \frac{W_0(L)}{\lambda} + \|E\|_0. \qquad (16)$$

It is clear that this does not change the minimizers, since the objective is only scaled by the positive constant $1/\lambda$.
In addition, we assume that there exists $L \in \mathrm{SR}_k$; otherwise the statement would be trivial, since Eqn(1) would
not be feasible, and the value of the objective function in Eqn(2) would be infinite.

Let $(L, E)$ be a global minimizer of Eqn(2); then $L \in \mathrm{SR}_k$. If there exists $E'$ such that
$\|E'\|_0 < \|E\|_0$ and $L' = X - E' \in \mathrm{SR}_k$, we have

$$f(L', E') = \|E'\|_0 + 1 + \frac{W_0(L')}{\lambda} - 1 \le \|E\|_0 + \frac{W_0(L')}{\lambda} - 1. \qquad (17)$$

Since $(L, E)$ is a global minimizer, $f(L, E) \le f(L', E')$. Combining the latter with Eqn(17) yields

$$0 \le f(L', E') - f(L, E) \le \frac{W_0(L') - W_0(L)}{\lambda} - 1. \qquad (18)$$

Then it follows that

$$\lambda \le W_0(L') - W_0(L). \qquad (19)$$

Note that when $L \in \mathrm{SR}_k$, $0 \le W_0(L) \le n^2$, where n is the number of columns of L. Therefore, letting
$\lambda > n^2$ will violate Eqn(19), since

$$\lambda > n^2 \ge W_0(L') - W_0(L). \qquad (20)$$

Hence, with $\lambda > n^2$, E is also a solution of Eqn(1). Lemma 1 is proved. □

B. Proof of Theorem 1

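First, for any other feasible solution $(L', E')$, $L'$ must still be in $\mathrm{SR}_k$. This is equivalent to saying that,
for any perturbation $Z = L' - L$ on $L$, we have $L + Z \in \mathrm{SR}_k$.

We next show that if Z satisfies the following condition, which guarantees the exact recovery of $(L, E)$ via solving Eqn(3),

$$\|P_{\Omega_E^c} Z\|_1 - \|P_{\Omega_E} Z\|_1 > \frac{\|W\|_1 - \|W'\|_1}{\lambda}, \qquad (21)$$

then for $(L', E') = (L + Z, E - Z)$, $f(L, E) < f(L', E')$.
Consider

$$f(L', E') - f(L, E) = \|E - Z\|_1 - \|E\|_1 + \frac{\|W'\|_1}{\lambda} - \frac{\|W\|_1}{\lambda}. \qquad (22)$$

By using the disjoint property of $\Omega_E$ and $\Omega_E^c$, we have

$$\begin{aligned}
\|E - Z\|_1 - \|E\|_1 &= \|E - P_{\Omega_E} Z - P_{\Omega_E^c} Z\|_1 - \|E\|_1 \\
&= \|E - P_{\Omega_E} Z\|_1 + \|P_{\Omega_E^c} Z\|_1 - \|E\|_1 \\
&\ge \|E\|_1 - \|P_{\Omega_E} Z\|_1 + \|P_{\Omega_E^c} Z\|_1 - \|E\|_1 \\
&= \|P_{\Omega_E^c} Z\|_1 - \|P_{\Omega_E} Z\|_1 \\
&> \frac{\|W\|_1 - \|W'\|_1}{\lambda}. \qquad (23)
\end{aligned}$$

It then follows from Eqn(22) and Eqn(23) that $f(L', E') - f(L, E) > 0$, hence proving Theorem 1.

□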
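
The key step in the proof above is the bound $\|E - Z\|_1 - \|E\|_1 \ge \|P_{\Omega_E^c} Z\|_1 - \|P_{\Omega_E} Z\|_1$, which combines the disjoint supports with the triangle inequality. As a sanity check, the following sketch verifies it numerically on random instances; the matrix sizes, sparsity level and random seed are arbitrary assumptions.

```python
import numpy as np

def l1(M):
    """Entry-wise l1 norm of a matrix."""
    return np.abs(M).sum()

rng = np.random.default_rng(0)
m, n = 8, 10

# Sparse error term E with a random support Omega_E (hypothetical test data).
support = rng.random((m, n)) < 0.2
E = np.zeros((m, n))
E[support] = rng.normal(size=int(support.sum()))

worst_gap = np.inf
for _ in range(1000):
    Z = rng.normal(size=(m, n))
    Z_on = np.where(support, Z, 0.0)    # P_{Omega_E} Z
    Z_off = np.where(support, 0.0, Z)   # P_{Omega_E^c} Z
    lhs = l1(E - Z) - l1(E)
    rhs = l1(Z_off) - l1(Z_on)
    worst_gap = min(worst_gap, lhs - rhs)

# The bound predicts lhs - rhs >= 0 for every perturbation Z.
print(worst_gap)
```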


C. Proof of Theorem 2

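Let X represent the dataset with unit-length data, and $S = S^1 \cup S^2 \cup \cdots \cup S^J$ its underlying structure as a
union of subspaces. Let the partition of X corresponding to S be $X = [X^1, X^2, \ldots, X^J]$. Then for any $x_i \in X^l$,
there is a linear combination of the other samples in $X^l$ that represents $x_i$ as $x_i = X^l_{-i} w$, where $X^l_{-i}$
denotes $X^l$ with the column $x_i$ removed. We therefore have a feasible solution for the following problem,

$$w^* = \arg\min_{w} \|w\|_1 \quad \text{s.t.} \quad X^l_{-i} w = x_i. \qquad (24)$$

Then the dual problem of Eqn(24), given below, also has at least one feasible point,

$$\max_{\boldsymbol{\lambda}} \langle x_i, \boldsymbol{\lambda} \rangle \quad \text{s.t.} \quad \|(X^l_{-i})^T \boldsymbol{\lambda}\|_\infty \le 1. \qquad (25)$$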

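Let the support of $w^*$ be $\Omega_0$ and consider the dual vector $\boldsymbol{\lambda}^*$ satisfying

$$\boldsymbol{\lambda}^* = \arg\min_{\boldsymbol{\lambda}} \|\boldsymbol{\lambda}\|_2 \quad \text{s.t.} \quad (X^l_{\Omega_0})^T \boldsymbol{\lambda} = \mathrm{sgn}(w^*_{\Omega_0}), \quad \|(X^l_{\Omega_0^c})^T \boldsymbol{\lambda}\|_\infty \le 1. \qquad (26)$$

It is worth noting that Eqn(24) and Eqn(26) imply that $x_i \in \mathrm{span}(X^l_{\Omega_0})$. Additionally, there are some
properties of $\boldsymbol{\lambda}^*$ which are crucial to the proof.

First, let $\boldsymbol{\lambda}^* = \boldsymbol{\lambda}^*_{S^l} + \boldsymbol{\lambda}^*_{(S^l)^\perp}$. Since
$\boldsymbol{\lambda}^*$ is the feasible point with the least $\ell_2$ norm, and
$(X^l_{\Omega_0})^T \boldsymbol{\lambda}^*_{(S^l)^\perp} = 0$, $(X^l_{\Omega_0^c})^T \boldsymbol{\lambda}^*_{(S^l)^\perp} = 0$,
we have $\boldsymbol{\lambda}^*_{(S^l)^\perp} = 0$, and therefore $\boldsymbol{\lambda}^* \in S^l$.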

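Furthermore, the first constraint in Eqn(26) can be rewritten as

$$|\langle x, \boldsymbol{\lambda}^* \rangle| = 1, \quad \|x\|_2 = 1, \quad \forall x \in X^l_{\Omega_0}, \qquad (27)$$

which implies that $\boldsymbol{\lambda}^*$ passes through the center of the circumscribed sphere of $\tilde{X}^l_{\Omega_0}$,
where $\tilde{X}^l_{\Omega_0} \subset \pm X^l_{\Omega_0}$ and $\langle \tilde{x}_j, \boldsymbol{\lambda}^* \rangle = 1$,
$\forall j \in \Omega_0$.

Now consider the $\Delta$-density condition for $X^l$; it follows that

$$\theta(\boldsymbol{\lambda}^*, x) \le \Delta, \quad \forall x \in \tilde{X}^l_{\Omega_0}. \qquad (28)$$

This, in combination with $\|x\|_2 = 1$, yields

$$\|\boldsymbol{\lambda}^*\|_2 \le 1/\cos(\Delta). \qquad (29)$$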

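We subsequently use $\boldsymbol{\lambda}^*$ and $w^*$ to further constrain the optimal solution of Eqn(6).

In particular, we have the following lemma from [18] using the dual certificate technique.

Lemma 2. Suppose there exists a vector c which is feasible for the primal problem

$$\min_{z} \|z\|_1 \quad \text{s.t.} \quad Az = y, \qquad (P)$$

and the support of c is $S \subseteq \Omega$. Then, if there is a dual vector v satisfying

$$A_S^T v = \mathrm{sgn}(c_S), \quad \|A_{\Omega \cap S^c}^T v\|_\infty \le 1, \quad \|A_{\Omega^c}^T v\|_\infty < 1,$$

all optimal solutions $z^*$ to (P) have $z^*_{\Omega^c} = 0$.

We next construct a primal feasible point for Eqn(6) from $w^*$. Let the index set of $X^l_{-i}$ in X be $\Omega$; then w
satisfying $w_\Omega = w^*$, $w_{\Omega^c} = 0$ is also feasible for Eqn(6). Additionally, since
$X_{\Omega_0} = X^l_{\Omega_0}$ and $X_{\Omega \cap \Omega_0^c} = X^l_{\Omega_0^c}$, $\boldsymbol{\lambda}^*$ has the following
property from Eqn(26),

$$X_{\Omega_0}^T \boldsymbol{\lambda}^* = \mathrm{sgn}(w^*_{\Omega_0}), \quad \|X_{\Omega \cap \Omega_0^c}^T \boldsymbol{\lambda}^*\|_\infty \le 1. \qquad (30)$$

Then, according to Lemma 2, if we further have $\|X_{\Omega^c}^T \boldsymbol{\lambda}^*\|_\infty < 1$, in combination with
the condition that $w_{\Omega^c} = 0$, all optimal solutions w of Eqn(6) satisfy $w_{\Omega^c} = 0$, which essentially
implies the $\ell_1$ subspace detection property.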

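Given that the principal angle between any pair of subspaces is larger than $\Delta$, we have

$$\|P_{S^l} x\|_2 < \|x\|_2 \cos(\Delta) = \cos(\Delta), \quad \forall x \in X_{\Omega^c}. \qquad (31)$$

Combined with Eqn(29), for all $x \in X_{\Omega^c}$, it follows that

$$|\langle x, \boldsymbol{\lambda}^* \rangle| = |\langle P_{S^l} x, \boldsymbol{\lambda}^* \rangle| \le \|P_{S^l} x\|_2 \|\boldsymbol{\lambda}^*\|_2 < \cos(\Delta) \cdot \frac{1}{\cos(\Delta)} = 1, \qquad (32)$$

and thus proving Theorem 2.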

Appendix  B 

Zero  Duality  Gap  of  the  Dual  Problem 

In Section IV, we extended our algorithm RoSuRe to address Problem (8). Essentially, our algorithm can be seen as a dual
method, which relies on solving the dual problem instead of the primal one. However, as we mentioned in Section IV, a
duality gap usually exists for general non-convex programming. We therefore use the framework of the augmented Lagrange method to
"convexify" the Lagrange function of (8). To substantiate this choice, in this section we show the zero duality
gap between the primal problem (8) and the associated "augmented" dual problem.

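First, consider the nonlinear programming problem with equality constraints in the following general form,

$$\min_{x} f(x) \quad \text{s.t.} \quad h(x) = 0, \ x \in \Omega, \qquad (P)$$

then the primal function associated with (P) is defined as

$$p(z) = \inf\{f(x) : h(x) \le z, \ -h(x) \le z, \ x \in \Omega\}. \qquad (33)$$

In addition, the augmented Lagrange function is defined as

$$L(x, y, \mu) = f(x) + \langle y, h(x) \rangle + \frac{\mu}{2}\|h(x)\|^2, \quad x \in \Omega, \qquad (34)$$

which yields the dual problem of (P) as follows,

$$\max_{y, \mu} \ g(y, \mu), \quad \text{where} \quad g(y, \mu) = \inf_{x \in \Omega} L(x, y, \mu). \qquad (D)$$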

The  augmented  Lagrange  method  for  non-convex  programming  is  extensively  discussed  in  [16],  and  a  sufficient  and  necessary 
condition  for  a  zero  duality  gap  is  further  proved.  In  particular,  two  conditions,  i.e.  the  quadratic  growth  condition  and  the 
stability  of  degree  0,  are  critical  for  a  non-convex  problem  to  be  solved  by  a  dual  method.  We  therefore  first  give  the  definition 
of  these  two  conditions,  and  then  show  that  Problem  (8)  satisfies  them. 
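
To illustrate why the augmented term matters for non-convex problems, consider a one-dimensional toy problem that is not taken from this paper: minimize $f(x) = -x^2$ subject to $h(x) = x - 1 = 0$. The ordinary Lagrangian $-x^2 + y(x - 1)$ is unbounded below for every multiplier y, so the ordinary dual value is $-\infty$ while the primal value is $-1$. With a penalty weight $\mu > 2$ the augmented Lagrangian becomes strictly convex in x, and the multiplier iteration below recovers the primal value. The problem, the value $\mu = 10$ and the function names are assumptions for illustration.

```python
def f(x):
    """Non-convex toy objective."""
    return -x ** 2

def h(x):
    """Equality constraint h(x) = 0."""
    return x - 1.0

def argmin_augmented(y, mu):
    """Closed-form minimizer of f(x) + y*h(x) + (mu/2)*h(x)^2, valid for mu > 2."""
    # Setting the derivative -2x + y + mu*(x - 1) to zero gives x = (mu - y) / (mu - 2).
    return (mu - y) / (mu - 2.0)

mu = 10.0   # penalty weight; must exceed 2 so the inner problem is strictly convex
y = 0.0     # initial multiplier
for _ in range(50):
    x = argmin_augmented(y, mu)
    y = y + mu * h(x)          # multiplier update of the augmented Lagrange method
x = argmin_augmented(y, mu)

dual_value = f(x) + y * h(x) + 0.5 * mu * h(x) ** 2
primal_value = f(1.0)          # the only feasible point is x = 1
print(x, y, dual_value, primal_value)
```

In this toy case the augmented dual value matches the primal value, which is the zero duality gap behavior that the two conditions below are meant to guarantee in general.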

Definition 8. (Quadratic Growth Condition) We say that (P) satisfies the quadratic growth condition if, for certain real
numbers $\mu$ and $\eta$,

$$L(x, 0, \mu) = f(x) + \frac{\mu}{2}\|h(x)\|^2 \ge \eta, \quad \forall x \in \Omega. \qquad (35)$$

Definition 9. (Stable of degree k) If there is an open neighborhood U of the origin (in the space of z) and a function
$\omega : U \to \mathbb{R}$ of class $C^k$ such that the primal function p(z) of (P) satisfies the following condition:

$$p(z) \ge \omega(z), \ \forall z \in U, \quad \text{with} \quad p(0) = \omega(0),$$

then (P) is (lower) stable of degree k.

Lemma 3. The associated primal function of (8) satisfies the quadratic growth condition and is stable of degree 0.

Proof. We first show that the primal function p(z) satisfies the quadratic growth condition. Note that the quadratic growth
condition holds if f(x) is bounded below on $\Omega$. In (8), $f(x) = \|W\|_1 + \lambda\|E\|_1 \ge 0$, and thus the
associated p(z) has a lower bound on $\Omega$.

We next show that p(z) is stable of degree 0. First of all, the stability of degree 0 is equivalent to the following condition [16]:

$$p(0) = \liminf_{z \to 0} p(z) > -\infty. \qquad (36)$$

Then constructing a compact set including p(0) would suffice for (36). Specifically, a sufficient condition for (36) is as
follows: $\Omega$ is closed, h(x) is continuous, and for some z and $C > \inf p(z)$, the set

$$S = \{x \in \Omega \mid f(x) \le C, \ -z \le h(x) \le z\}$$

is compact.

In problem (8), $\Omega = \{(W, E) : W_{ii} = 0\}$ is closed, and h(x) is obviously continuous. To check the
compactness of S, let $C > \lambda\|X\|_1$. It is easy to see that (0, X) is a feasible point lying in both
$S_1 = \{x \in \Omega \mid f(x) \le C\}$ and $S_2 = \{x \mid -z \le h(x) \le z\}$. Since $S_1$ is closed and bounded, hence
compact, and $S_2$ is closed, $S = S_1 \cap S_2$ is also a compact set. We therefore have the conclusion
that p(z) of (8) is stable of degree 0. □

We finally have the sufficient condition, i.e. Lemma 3, to show the zero duality gap between (P) and (D), given the theorem
proved in [16]:

Theorem 3. The duality equation of (P),

$$\inf(P) = \sup(D),$$

holds if and only if (P) satisfies the quadratic growth condition and is stable of degree 0.